Abstract



This paper provides a nonparametric analysis for several classes of models, with cases such as classical measurement error, regression with errors in variables, and other models that may be represented in a form involving convolution equations. The focus here is on conditions for existence of solutions, nonparametric identification and well-posedness in the space S* of generalized functions (tempered distributions). This space provides advantages over working in function spaces by relaxing assumptions and extending the results to include a wider variety of models, for example by not requiring the existence of a density. Classes of (generalized) functions for which solutions exist are defined; identification conditions, partial identification and its implications are discussed. Conditions for well-posedness are given, and the related issues of plug-in estimation and regularization are examined.

1 Introduction 

Many statistical and econometric models involve independence (or conditional independence) conditions that can be expressed via convolution. Examples are independent errors, classical measurement error and Berkson error, regressions involving data measured with these types of errors, common factor models, and models that conditionally on some variables can be represented in similar forms, such as a nonparametric panel data model with errors that, conditionally on observables, are independent of the idiosyncratic component.
Although the convolution operator is well known, this paper explicitly provides convolution equations for a wide list of models for the first time. In many cases the analysis in the literature takes Fourier transforms as the starting point, e.g. characteristic functions for distributions of random vectors (as in the famous Kotlyarski lemma, 1967). The emphasis here on convolution equations for the models makes it possible to state explicitly the nonparametric classes of functions defined by the model for which such equations hold, in particular for densities, conditional densities and regression functions. The statistical model may give rise to different systems of convolution equations and may be over-identified in terms of convolution equations; some choices may be better suited to different situations. For example, in Section 2 two sets of convolution equations (4 and 4a in Table 1) are provided for the same classical measurement error model with two measurements; it turns out that one of them allows some independence conditions to be relaxed, while the other makes it possible to relax a support assumption in identification. Many of the convolution equations derived here are based on density-weighted conditional averages of the observables.

The main distinguishing feature is that here all the functions defined by the model are considered within the space of generalized functions S*, the space of so-called tempered distributions (they will be referred to as generalized functions). This is the dual space, the space of linear continuous functionals, on the space S of well-behaved functions: the functions in S are infinitely differentiable and all their derivatives go to zero at infinity faster than any power. An important advantage of assuming that the functions are in the space of generalized functions is that in that space any distribution function has a density (a generalized function) that depends continuously on the distribution function, so that distributions with mass points and fractal measures have well-defined generalized densities.

Any regular function majorized by a polynomial belongs to S*; this includes polynomially growing regression functions and binary choice regression, as well as many conditional density functions. Another advantage is that the Fourier transform is an isomorphism of this space, and thus the usual approaches in the literature that employ characteristic functions are also included. Details about the space S* are in Schwartz (1966) and are summarized in Zinde-Walsh (2012).

The model classes examined here lead to convolution equations that are similar to each other in form; the main focus of this paper is on existence, identification, partial identification and well-posedness conditions. Existence and uniqueness of solutions to some systems of convolution equations in the space S* were established in Zinde-Walsh (2012). Those results are used here to state identification in each of the models. Identification requires examining the support of the functions and generalized functions that enter into the models; if the support excludes an open set, then identification fails at least for some unknown functions in the model. However, isolated points or lower-dimensional manifolds where, e.g., the characteristic function takes zero values (an example is the uniform distribution) do not preclude identification in some of the models. This point was made in e.g. Carrasco and Florens (2010) and Evdokimov and White (2011), and is expressed here in the context of operating in S*. A support restriction for the solution may imply that only partial identification is provided. However, even in partially identified models some features of interest (see, e.g., Matzkin, 2007) could be identified, so some questions could be addressed even in the absence of full identification. A common example of incomplete identification which nevertheless provides important information is Gaussian deconvolution of a blurred image of a car obtained from a traffic camera; the filtered image is still not very good, but the licence plate number is visible for forensics.

Well-posedness conditions are emphasized here. The well-known definition by Hadamard (1923) defines well-posedness via three conditions: existence of a solution, uniqueness of the solution, and continuity in some suitable topology. The first two are essentially identification. Since here we define the functions in subclasses of S*, we consider continuity in the topology of this generalized functions space. This topology is weaker than the topologies in function spaces, such as the uniform or $L_p$ topologies; thus differentiating the distribution function to obtain a density is a well-posed problem in S*. By contrast, even in the class of absolutely continuous distributions with the uniform metric, where identification of the density in the space $L_1$ holds, well-posedness does not obtain (see the discussion in Zinde-Walsh, 2011). But even though well-posedness obtains more widely in the weaker topology of S*, for the problems considered here some additional restrictions may be required for well-posedness.
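A small numerical sketch of this contrast, using a construction of our own (not taken from the paper): perturb the $N(0,1)$ density $f$ to $f_n(x) = f(x)(1 + \cos nx)/c_n$, with $c_n = 1 + e^{-n^2/2}$ chosen so that $f_n$ is again a density. As $n$ grows the distribution functions converge uniformly, the densities stay roughly $2/\pi$ apart in $L_1$, yet the pairing $(f_n - f, \psi)$ with a fixed test function $\psi \in S$ (a Gaussian, our choice) goes to zero, which is convergence in S*. The grid and helper names are illustrative.

```python
import numpy as np

# Illustrative perturbation (our choice, not from the paper):
# f_n(x) = f(x) * (1 + cos(n x)) / c_n, f the N(0, 1) density,
# c_n = 1 + exp(-n^2 / 2) so that f_n integrates to one.
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]


def trapezoid(y):
    """Trapezoid rule on the uniform grid x."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


def cumulative_trapezoid(y):
    """Running trapezoid integral from the left end of the grid."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dx)])


f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
psi = np.exp(-(x - 0.3) ** 2)  # a Gaussian test function, which lies in S


def distances(n):
    fn = f * (1 + np.cos(n * x)) / (1 + np.exp(-n**2 / 2))
    diff = fn - f
    sup_cdf = np.max(np.abs(cumulative_trapezoid(diff)))  # uniform distance of CDFs
    l1_density = trapezoid(np.abs(diff))                   # L1 distance of densities
    pairing = abs(trapezoid(diff * psi))                   # |(f_n - f, psi)|, the S* side
    return sup_cdf, l1_density, pairing


for n in (5, 20, 80):
    print(n, distances(n))
```

The first and third numbers shrink as $n$ grows while the middle one does not, which is the sense in which differentiation is continuous into S* but not into $L_1$.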

Well-posedness is important for plug-in estimation since, if the estimators are in a class where the problem is well-posed, they are consistent; conversely, if well-posedness does not hold, consistency will fail for some cases. Lack of well-posedness can be remedied by regularization, but the price is often more extensive requirements on the model and slower convergence. For example, in deconvolution (see e.g. Fan, 1991, and most other papers cited here) spectral cut-off regularization is utilized; it crucially depends on knowing the rate of decay at infinity of the density.

Often nonparametric identification is used to justify parametric or semiparametric estimation; the claim here is that well-posedness should be an important part of this justification. The reason is that in estimating a possibly misspecified parametric model, the misspecified functions of the observables belong to a nonparametric neighborhood of the true functions; if the model is nonparametrically identified, the unique solution to the true model exists, but without well-posedness the solution to the parametric model and the true one may be far apart.

For deconvolution, An and Hu (2012) demonstrate well-posedness in spaces of integrable density functions when the measurement error has a mass point; this may happen in surveys when the probability of truthful reporting is non-zero. The conditions for well-posedness here are provided in S*; this additionally does not exclude mass points in the distribution of the mismeasured variable itself, and there is some empirical evidence of mass points in earnings and income. The results here show that in S* well-posedness holds more generally: as long as the error distribution is not super-smooth.
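"Super-smooth" refers to the usual classification of error distributions by the decay of their characteristic function (Fan, 1991). As a rough sketch of why the distinction matters (our illustration, not the paper's well-posedness argument), the code compares the factor $1/\phi_u(t)$ by which deconvolution amplifies errors at frequency $t$ for a Gaussian error (super-smooth) and a Laplace error (ordinary smooth); the unit scales `sigma` and `b` are arbitrary choices.

```python
import numpy as np

# Assumed error laws (illustrative): N(0, sigma^2), super-smooth, and
# Laplace(0, b), ordinary smooth; unit scales are arbitrary.
sigma, b = 1.0, 1.0


def cf_gaussian(t):
    """Characteristic function of N(0, sigma^2): exponential decay."""
    return np.exp(-0.5 * sigma**2 * t**2)


def cf_laplace(t):
    """Characteristic function of Laplace(0, b): polynomial decay."""
    return 1.0 / (1.0 + b**2 * t**2)


# Deconvolution divides by the error characteristic function, so 1/cf is the
# factor by which estimation noise at frequency t is magnified.
for t in (1.0, 5.0, 10.0):
    print(f"t={t:5.1f}  Gaussian 1/cf={1 / cf_gaussian(t):.3e}  Laplace 1/cf={1 / cf_laplace(t):.3e}")
```

The Laplace factor grows like $t^2$, the Gaussian one like $e^{t^2/2}$.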

The solutions for the systems of convolution equations can be used in plug-in estimation. Properties of nonparametric plug-in estimators are based on results on stochastic convergence in S* for the solutions, which are stochastic functions expressed via the estimators of the known functions of the observables.

Section 2 of the paper enumerates the classes of models considered here. They are divided into three groups: 1. measurement error models with classical and Berkson errors and possibly an additional measurement, and common factor models that transform into those models; 2. nonparametric regression models with classical measurement and Berkson errors in variables; 3. measurement error and regression models with conditional independence. The corresponding convolution equations and systems of equations are provided and discussed. Section 3 is devoted to describing the solutions to the convolution equations of the models. The main mathematical aspect of the different models is that they require solving equations of a similar form. Section 4 provides a table of identified solutions and discusses partial identification and well-posedness. Section 5 examines plug-in estimation. A brief conclusion follows.
2 Convolution equations in classes of models with independence or conditional independence

This section derives systems of convolution equations for some important classes of models. The first class is measurement error models with some independence (classical or Berkson error) and possibly a second measurement; the second class is regression models with classical or Berkson-type error; the third is models with conditional independence. For the first two classes the distributional assumptions for each model and the corresponding convolution equations are summarized in tables, indicating which of the functions are known and which are unknown; a brief discussion of each model and the derivation of the convolution equations follow. The last part of this section discusses convolution equations for two specific models with conditional independence: one is a panel data model studied by Evdokimov (2011), the other a regression model where independence of the measurement error of some regressors obtains conditionally on a covariate.

The general assumption made here is that all the functions in the convolution equations belong to the space of generalized functions S*.

Assumption 1. All the functions defined by the statistical model are in 
the space of generalized functions S*. 

This space of generalized functions includes functions from most of the function classes that are usually considered, but allows for some useful generalizations. The next subsection provides the necessary definitions and some of the implications of working in the space S*.

2.1 The space of generalized functions S*. 

The space S* is the dual space, i.e. the space of continuous linear functionals on the space S of test functions. The theory of generalized functions is in Schwartz (1966); relevant details are summarized in Zinde-Walsh (2012). In this subsection the main definitions and properties are reproduced. Recall the definition of S.

For any vector of non-negative integers $m = (m_1, \dots, m_d)$ and vector $t \in \mathbb{R}^d$ denote by $t^m$ the product $t_1^{m_1} \cdots t_d^{m_d}$ and by $\partial^m$ the differentiation operator $\frac{\partial^{m_1}}{\partial t_1^{m_1}} \cdots \frac{\partial^{m_d}}{\partial t_d^{m_d}}$; $C_\infty$ is the space of infinitely differentiable (real or complex-valued) functions on $\mathbb{R}^d$. The space $S \subset C_\infty$ of test functions is defined as

$$S = \{\psi \in C_\infty(\mathbb{R}^d) : |t^l \partial^k \psi(t)| = o(1) \text{ as } t \to \infty\},$$

for any $k = (k_1, \dots, k_d)$, $l = (l_1, \dots, l_d)$, where $k = (0, \dots, 0)$ corresponds to the function itself and $t \to \infty$ coordinate-wise; thus the functions in S go to zero at infinity faster than any power, as do their derivatives; they are rapidly decreasing functions. A sequence in S converges if in every bounded region each $|t^l \partial^k \psi(t)|$ converges uniformly.

Then in the dual space S* any $b \in S^*$ represents a linear functional on S; the value of this functional for $\psi \in S$ is denoted by $(b, \psi)$. When $b$ is an ordinary (point-wise defined) real-valued function, such as a density of an absolutely continuous distribution or a regression function, the value of the functional on real-valued $\psi$ defines it and is given by

$$(b, \psi) = \int b(x)\, \psi(x)\, dx.$$

If $b$ is a characteristic function, it may be complex-valued; then the value of the functional $b$ applied to $\psi \in S$, where S is the space of complex-valued functions, is

$$(b, \psi) = \int b(x)\, \overline{\psi(x)}\, dx,$$

where the overbar denotes complex conjugate. The integrals are taken over the whole space $\mathbb{R}^d$.

The generalized functions in the space S* are infinitely differentiable and the differentiation operator is continuous; Fourier transforms and their inverses are defined for all $b \in S^*$, and the Fourier transform is a (continuous) isomorphism of the space S*. However, convolutions and products are not defined for all pairs of elements of S*, unlike, say, in the space $L_1$; on the other hand, in $L_1$ differentiation is not defined and not every distribution has a density that is an element of $L_1$.

Assumption 1 places no restrictions on the distributions, since in S* any distribution function is differentiable and the differentiation operator is continuous. The advantage of not restricting distributions to be absolutely continuous is that mass points need not be excluded; distributions representing fractal measures, such as the Cantor distribution, are also allowed. This means that mixtures of discrete and continuous distributions are included, e.g. those examined by An and Hu (2012) for measurement error in survey responses, where some responses may be error-contaminated but some may be truthful, leading to a mixture with a mass point. Moreover, in S* the case of mass points in the distribution of the mismeasured variable is also easily handled; in the literature such mass points are documented for income or work hours distributions in the presence of rigidities such as unemployment compensation rules (e.g. Green and Riddell, 1997). Fractal distributions may arise in some situations, e.g. Karlin's (1958) example of the equilibrium price distribution in an oligopolistic game.
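As an illustration of a generalized density with a mass point (our example; the weight `p`, the test function `psi` and the grid are assumptions), take the distribution function $F$ of a mixture with mass $p$ at 0 and $(1-p)$ times a $N(0,1)$ law. In S* its density $F'$ is defined through $(F', \psi) = -(F, \psi')$; the sketch checks numerically that this equals $p\,\psi(0) + (1-p)\int \phi(x)\psi(x)\,dx$, a point mass plus an ordinary density.

```python
import math

import numpy as np

# Hypothetical mixture: mass p at 0 plus (1 - p) times N(0, 1).
p = 0.3
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]


def trapezoid(y):
    """Trapezoid rule on the uniform grid x."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


def psi(t):
    """A Gaussian test function in S (illustrative choice)."""
    return np.exp(-(t - 0.3) ** 2)


def dpsi(t):
    """Derivative of psi."""
    return -2.0 * (t - 0.3) * np.exp(-(t - 0.3) ** 2)


# Distribution function: a jump of size p at 0 plus a smooth part.
Phi = 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
F = p * (x >= 0) + (1 - p) * Phi

# Generalized density acting on psi: (F', psi) = -(F, psi').
via_derivative = -trapezoid(F * dpsi(x))

# The same functional written as point mass plus ordinary density.
phi = np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)
via_mixture = p * psi(0.0) + (1 - p) * trapezoid(phi * psi(x))

print(via_derivative, via_mixture)
```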

For regression functions the assumption $g \in S^*$ implies that growth at infinity is allowed but is somewhat restricted. In particular, for any ordinary point-wise defined function $b \in S^*$ the condition

$$\int \cdots \int \prod_{i=1}^{d} (1 + t_i^2)^{-m_i}\, |b(t)|\, dt_1 \cdots dt_d < \infty \tag{1}$$

needs to be satisfied for some non-negative $m_1, \dots, m_d$. If a locally integrable function $g$ is such that its growth at infinity is majorized by a polynomial, then $b = g$ satisfies this condition. While restrictive, this still widens the applicability of many currently available approaches. For example, in Berkson regression the common assumption is that the regression function be absolutely integrable (Meister, 2009); this excludes binary choice, linear and polynomial regression functions, which belong to S* and satisfy Assumption 1. Also, it is advantageous to allow for functions that may not belong to any ordinary function classes, such as sums of $\delta$-functions ("sum of peaks") or (mixture) cases with sparse parts of support, such as isolated points; such functions are in S*. Distributions with mass points can arise when the response to a survey question may be only partially contaminated; regression "sum of peaks" functions arise e.g. in spectroscopy and astrophysics, where isolated point supports are common.
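To make condition (1) concrete, here is a worked example of our own (not from the paper) with $d = 1$: the quadratic regression function $b(t) = t^2$, which is not integrable, satisfies (1) with $m_1 = 2$ but not with $m_1 = 1$; the first integral uses the substitution $t = \tan\theta$.

```latex
\int_{-\infty}^{\infty} \frac{t^{2}}{(1+t^{2})^{2}}\,dt
  \overset{t=\tan\theta}{=} \int_{-\pi/2}^{\pi/2} \sin^{2}\theta\,d\theta
  = \frac{\pi}{2} < \infty \quad (m_1 = 2),
\qquad
\int_{-\infty}^{\infty} \frac{t^{2}}{1+t^{2}}\,dt = \infty \quad (m_1 = 1).
```

By contrast, for $b(t) = e^{t}$ the integrand in (1) is unbounded for every choice of $m_1$, so exponentially growing functions are excluded by this condition.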

2.2 Measurement error and related models 

Current reviews of measurement error models are in Carroll et al. (2006), Chen et al. (2011) and Meister (2009).

Here and everywhere below the variables are assumed to be in $\mathbb{R}^d$; all the integrals are over the corresponding space; the density of $u$ for any $u$ is denoted by $f_u$; independence is denoted by $\perp$; the expectation of $x$ conditional on $z$ is denoted by $E(x|z)$.

2.2.1 List of models and corresponding equations 

The table below lists various models and the corresponding convolution equations. Many of the equations are derived from density-weighted conditional expectations of the observables.
Recall that for two functions $f$ and $g$ the convolution $f * g$ is defined by

$$(f * g)(x) = \int f(w)\, g(x - w)\, dw;$$

this expression is not always defined. A similar expression (with some abuse of notation, since generalized functions are not defined pointwise) may hold for generalized functions in S*; similarly, it is not always defined. With Assumption 1, for the models considered here we show that the convolution equations given in the tables below hold in S*.
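The sketch below (our illustration; the weight, variances and grid are assumed) checks the model 1 equation $f_{x^*} * f_u = f_z$ on a grid when $x^*$ has a mass point, so that $f_{x^*}$ is only a generalized function. On the grid the point mass is represented by a spike of height $p/\Delta x$, and the discrete convolution reproduces the closed form $p\,f_u + (1-p)$ times the $N(0, 1+s^2)$ density, reflecting $\delta * f_u = f_u$.

```python
import numpy as np

# Illustrative set-up (our assumption): x* has mass p at 0 and is (1 - p) N(0, 1)
# otherwise; u ~ N(0, s^2) is independent of x*.
p, s = 0.4, 0.5
x = np.linspace(-10.0, 10.0, 2001)   # symmetric odd-length grid, step 0.01
dx = x[1] - x[0]
centre = len(x) // 2                  # index of x = 0


def normal_pdf(t, sd):
    return np.exp(-0.5 * (t / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Generalized density of x*: the point mass becomes a grid spike of height p/dx.
f_xstar = (1 - p) * normal_pdf(x, 1.0)
f_xstar[centre] += p / dx
f_u = normal_pdf(x, s)

# Riemann-sum approximation of (f_x* * f_u)(z) = integral of f_x*(t) f_u(z - t) dt;
# mode="same" keeps the output aligned with the symmetric grid.
f_z_grid = np.convolve(f_xstar, f_u, mode="same") * dx

# Closed form: delta * f_u = f_u, and N(0, 1) * N(0, s^2) = N(0, 1 + s^2).
f_z_exact = p * normal_pdf(x, s) + (1 - p) * normal_pdf(x, np.sqrt(1 + s**2))
print(np.max(np.abs(f_z_grid - f_z_exact)))
```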

Table 1. Measurement error models: 1. Classical measurement error; 2. 
Berkson measurement error; 3. Classical measurement error with additional 
observation (with zero conditional mean error); 4., 4a. Classical error with 
additional observation (full independence). 
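Model 1 (classical measurement error).
Distributional assumptions: $z = x^* + u$; $x^* \perp u$.
Convolution equations: $f_{x^*} * f_u = f_z$.
Known functions: $f_z$, $f_u$.
Unknown functions: $f_{x^*}$.

Model 2 (Berkson measurement error).
Distributional assumptions: $z = x^* + u$; $z \perp u$.
Convolution equations: $f_z * f_{-u} = f_{x^*}$.
Known functions: $f_z$, $f_u$.
Unknown functions: $f_{x^*}$.

Model 3 (classical error, additional observation with zero conditional mean error).
Distributional assumptions: $z = x^* + u$; $x = x^* + u_x$; $x^* \perp u$; $E(u_x|x^*, u) = 0$; $E\|z\| < \infty$; $E\|u\| < \infty$.
Convolution equations: $f_{x^*} * f_u = f_z$; $h_k * f_u = w_k$, with $h_k(x) = x_k f_{x^*}(x)$, $k = 1, 2, \dots, d$.
Known functions: $f_z$, $w_k$, $k = 1, 2, \dots, d$.
Unknown functions: $f_{x^*}$, $f_u$.

Model 4 (classical error, additional observation, full independence).
Distributional assumptions: $z = x^* + u$; $x = x^* + u_x$; $x^*$, $u$, $u_x$ mutually independent; $E(u_x) = 0$; $E\|z\| < \infty$; $E\|u\| < \infty$.
Convolution equations: $f_{x^*} * f_u = f_z$; $h_k * f_u = w_k$; $f_{x^*} * f_{u_x} = f_x$, with $h_k(x) = x_k f_{x^*}(x)$, $k = 1, 2, \dots, d$.
Known functions: $f_z$, $f_x$, $w_k$, $k = 1, 2, \dots, d$.
Unknown functions: $f_{x^*}$, $f_u$, $f_{u_x}$.

Model 4a (same model as 4, alternative equations).
Convolution equations: $f_{x^*} * f_u = f_z$; $f_{u_x} * f_{-u} = w$; $h_k * f_{-u} = w_k$, with $h_k(x) = x_k f_{u_x}(x)$, $k = 1, 2, \dots, d$.
Known functions: $f_z$, $w$, $w_k$, $k = 1, 2, \dots, d$.
Unknown functions: $f_{x^*}$, $f_u$, $f_{u_x}$.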

Notation: $k = 1, 2, \dots, d$; in 3. and 4., $w_k = E(x_k f_z(z)|z)$; in 4a, $w = f_{x-z}$ and $w_k = E(x_k w(x - z)|(x - z))$.
Theorem 1. Under Assumption 1, for each of the models 1-4 the corresponding convolution equations of Table 1 hold in the generalized functions space S*.

The proof is in the derivations of the following subsection.

Assumption 1 requires considering all the functions defined by the model as elements of the space S*, but if the functions (e.g. densities, conditional moments) exist as regular functions, the convolutions are just the usual convolutions of functions; on the other hand, the assumption makes it possible to consider convolutions for cases where distributions are not absolutely continuous.

2.2.2 Measurement error models and derivation of the corresponding equations.

1. The classical measurement error model. 

The case of classical measurement error is well known in the literature. The concept of an error independent of the variable of interest is applicable to many problems in seismology and image processing, where it may be assumed that the source of the error is unrelated to the signal. In e.g. Cunha et al. (2010) it is assumed that some constructed measurement of the ability of a child derived from test scores fits into this framework. As is well known, in regression a measurement error in the regressor can result in a biased estimator (attenuation bias).
Typically the convolution equation

$$f_{x^*} * f_u = f_z$$

is written for density functions when the distribution function is absolutely continuous. The usual approach to the possible non-existence of a density avoids considering the convolution and focuses on the characteristic functions. Since a density always exists as a generalized function and convolution for such generalized functions is always defined, it is possible to write convolution equations in S* for any distributions in model 1. The error distribution (and thus the generalized density $f_u$) is assumed known, so the solution can be obtained by "deconvolution" (Carroll et al. (2006), Meister (2009), the review of Chen et al. (2011), and papers by Fan (1991) and Carrasco and Florens (2010), among others).
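A minimal sketch of deconvolution for model 1 via characteristic functions, under assumptions chosen so that the answer is known: $x^* \sim N(0,1)$, $u \sim N(0, s^2)$ with $s$ known, and $f_z$ given exactly rather than estimated. The Fourier transform turns $f_{x^*} * f_u = f_z$ into $\phi_{x^*}\phi_u = \phi_z$; the code divides and inverts numerically over $|t| \le T$, a spectral cut-off whose level `T` is our choice, not a recommendation.

```python
import numpy as np

# Illustrative assumptions: x* ~ N(0, 1) (unknown to the analyst) and
# u ~ N(0, s^2) with s known, so f_z is the N(0, 1 + s^2) density.
s = 0.5
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]
f_z = np.exp(-x**2 / (2 * (1 + s**2))) / np.sqrt(2 * np.pi * (1 + s**2))

# Frequency grid with a spectral cut-off at |t| <= T.
T = 10.0
t = np.linspace(-T, T, 1001)
dt = t[1] - t[0]

# Characteristic function of z from its density: phi_z(t) = int e^{itx} f_z(x) dx.
phi_z = np.exp(1j * np.outer(t, x)) @ f_z * dx
phi_u = np.exp(-0.5 * s**2 * t**2)        # known error characteristic function

# Convolution becomes a product: divide, then invert the Fourier transform,
# f_x*(x) = (1 / 2 pi) int_{|t| <= T} e^{-itx} phi_x*(t) dt.
phi_xstar = phi_z / phi_u
f_xstar_hat = (np.exp(-1j * np.outer(x, t)) @ phi_xstar).real * dt / (2 * np.pi)

f_xstar_true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.max(np.abs(f_xstar_hat - f_xstar_true)))
```

With an estimated $f_z$, errors at frequency $t$ would be multiplied by $1/\phi_u(t)$, which is why the choice of cut-off matters in practice.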

2. The Berkson error model. 

For Berkson error the convolution equation is also well known. Berkson measurement error arises when the measurement is somehow controlled and the error is caused by independent factors, e.g. the amount of fertilizer applied is given but its absorption into the soil is partially determined by factors independent of that, or the students' grade distribution in a course is given in advance, or the distribution of categories for the evaluation of grant proposals is determined by the granting agency. The properties of Berkson error are very different from those of classical measurement error: e.g. it does not lead to attenuation bias in regression; also, in the convolution equation the unknown function is directly expressed via the known ones when the distribution of the Berkson error is known. For discussion see Carroll et al. (2006), Meister (2009) and Wang (2004).
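For model 2 the unknown density is obtained directly as $f_{x^*} = f_z * f_{-u}$. The sketch below (our assumptions: $z \sim N(0,1)$ and a skewed, mean-zero Berkson error $u = e - 1$ with $e \sim \mathrm{Exp}(1)$) computes this convolution on a grid and checks the implied moments, $\mathrm{Var}(x^*) = 2$ and $E(x^{*3}) = -E(u^3) = -2$. Using $f_u$ instead of its reflection $f_{-u}$ would flip the sign of the third moment.

```python
import numpy as np

# Illustrative assumptions: z ~ N(0, 1) observed; Berkson error u = e - 1,
# e ~ Exp(1), independent of z, so E(u) = 0, Var(u) = 1, E(u^3) = 2.
x = np.linspace(-25.0, 25.0, 5001)   # symmetric grid, step 0.01
dx = x[1] - x[0]

f_z = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f_u = np.where(x >= -1.0, np.exp(-(x + 1.0)), 0.0)   # density of u
f_minus_u = f_u[::-1]                                 # f_{-u}(t) = f_u(-t) on a symmetric grid

# Model 2: x* = z - u, hence f_x* = f_z * f_{-u}.
f_xstar = np.convolve(f_z, f_minus_u, mode="same") * dx


def moment(k):
    """k-th raw moment of the gridded density f_xstar (Riemann sum)."""
    return np.sum(x**k * f_xstar) * dx


print(moment(0), moment(2), moment(3))
```

The density of $u$ has a jump at $-1$, so the grid moments carry an error of order $\Delta x$; the checks allow for that.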

Models 3. and 4. The classical measurement error with another observation.

In models 3 and 4 the error distribution in the classical measurement error model is not known, but another observation of the mismeasured variable is available; this case has been treated in the literature and is reviewed in Carroll et al. (2006) and Chen et al. (2011). In econometrics such models were examined by Li and Vuong (1998), Li (2002), Schennach (2004) and subsequently others (see e.g. the review by Chen et al., 2011). In case 3 the additional observation contains an error that is not necessarily independent, but only has conditional mean zero.

Note that here the multivariate case is treated, where arbitrary dependence among the components of vectors is allowed. For example, it may be of interest to consider the vector of not necessarily independent latent abilities or skills as measured by different sections of an IQ test, or the GRE scores.

Extra measurements provide additional equations. Consider for any $k = 1, \dots, d$ the function of observables $w_k$ defined by the density-weighted expectation $E(x_k f_z(z)|z)$ as a generalized function; it is then determined by the values of the functional $(w_k, \psi)$ for every $\psi \in S$. Note that by assumption $E(x_k f_z(z)|z) = E(x^*_k f_z(z)|z)$; then for any $\psi \in S$ the value of the functional is

$$(w_k, \psi) = \int E(x^*_k|z)\, f_z(z)\, \psi(z)\, dz = \int\!\!\int x^*_k\, f_{x^*,z}(x^*, z)\, \psi(z)\, dx^* dz = \int\!\!\int x^*_k\, f_{x^*,u}(x^*, u)\, \psi(x^* + u)\, dx^* du = \int\!\!\int x^*_k\, f_{x^*}(x^*)\, f_u(u)\, \psi(x^* + u)\, dx^* du = (h_k * f_u, \psi).$$

The third expression is a double integral, which always exists if $E\|x^*\| < \infty$; this is a consequence of the boundedness of the expectations of $z$ and $u$. The fourth is a result of the change of variables $(x^*, z)$ into $(x^*, u)$, the fifth uses the independence of $x^*$ and $u$, and the sixth expression follows from the corresponding expression for the convolution of generalized functions (Schwartz, 1967, p. 246). The conditions of model 3 are not sufficient to identify the distribution of $u_x$; this is treated as a nuisance part in model 3.
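A numerical check of the model 3 equation $h * f_u = w$ for $d = 1$, in a Gaussian case we assume for illustration ($x^* \sim N(0,1)$, $u \sim N(0, s^2)$), where $E(x^*|z) = z/(1+s^2)$ and $f_z$, the $N(0, 1+s^2)$ density, are known in closed form. The grid convolution of $h(x) = x f_{x^*}(x)$ with $f_u$ should match the density-weighted conditional mean $w(z) = E(x^*|z)\, f_z(z)$.

```python
import numpy as np

# Illustrative Gaussian case (our assumption): x* ~ N(0, 1), u ~ N(0, s^2),
# z = x* + u, so E(x*|z) = z / (1 + s^2) and f_z is the N(0, 1 + s^2) density.
s = 0.7
x = np.linspace(-12.0, 12.0, 2401)   # symmetric grid, step 0.01
dx = x[1] - x[0]


def normal_pdf(t, var):
    return np.exp(-t**2 / (2 * var)) / np.sqrt(2 * np.pi * var)


h = x * normal_pdf(x, 1.0)                     # h(x) = x f_x*(x)
f_u = normal_pdf(x, s**2)
lhs = np.convolve(h, f_u, mode="same") * dx    # (h * f_u)(z) on the grid

# w(z) = E(x f_z(z) | z) = E(x*|z) f_z(z), using E(u_x | x*, u) = 0.
w = x / (1 + s**2) * normal_pdf(x, 1 + s**2)
print(np.max(np.abs(lhs - w)))
```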

The model in 4, with all the errors and the mismeasured variable independent of each other, was investigated by Kotlyarski (1967), who worked with the joint characteristic function. In 4 consider, in addition to the equations written for model 3, another equation that uses the independence between $x^*$ and $u_x$ and involves $f_{u_x}$:

$$f_{x^*} * f_{u_x} = f_x.$$


In representation 4a the convolution equations involving the density $f_{u_x}$ are obtained by applying the derivations that were used here for the model in 3.,

$$z = x^* + u; \quad x = x^* + u_x,$$

to the model in 4 with $x - z$ playing the role of $z$, $u_x$ playing the role of $x^*$, $-u$ playing the role of $u$, and $x^*$ playing the role of $u_x$. The additional convolution equations arising from the extra independence conditions provide extra equations and involve the unknown density $f_{u_x}$. This representation leads to a generalization of Kotlyarski's identification result similar to that obtained by Evdokimov (2011), who used the joint characteristic function. The equations in 4a make it possible to identify $f_u$ and $f_{u_x}$ ahead of $f_{x^*}$; for identification this requires less restrictive conditions on the support of the characteristic function of $x^*$.

2.2.3 Some extensions 

A. Common factor models. 

Consider a model $z = AU$, with $A$ a matrix of known constants, $z$ an $m \times 1$ vector of observables and $U$ a vector of unobservable variables. Usually $A$ is a block matrix and $AU$ can be represented via a combination of mutually independent vectors. Then, without loss of generality, consider the model

$$z = Ax^* + u, \tag{2}$$

where $A$ is an $m \times d$ known matrix of constants, $z$ is an $m \times 1$ vector of observables, the unobserved $x^*$ is $d \times 1$ and the unobserved $u$ is $m \times 1$. If the model (2) can be transformed to model 3 considered above, then $x^*$ will be identified whenever identification holds for model 3. Once some components are identified, identification of other factors could be considered sequentially.

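Lemma 1. If in the model (2) the vectors $x^*$ and $u$ are independent, all the components of the vector $u$ are mean independent of each other and are mean zero, and the matrix $A$ can be partitioned, after possibly some permutation of rows, as

$$A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}$$

with $\operatorname{rank} A_1 = \operatorname{rank} A_2 = d$, then the model (2) implies model 3.

Proof. Define $\tilde z = T_1 z$, where, conformably to the partition of $A$, the partitioned $T_1 = \begin{pmatrix} \tilde T_1 & 0 \end{pmatrix}$ with $\tilde T_1 A_1 x^* = x^*$ (such a $\tilde T_1$ always exists by the rank condition); then $\tilde z = x^* + \tilde u$, where $\tilde u = T_1 u$ is independent of $x^*$. Next define $T_2 = \begin{pmatrix} 0 & \tilde T_2 \end{pmatrix}$ similarly, with $\tilde T_2 A_2 x^* = x^*$. Then $\tilde x = T_2 z$ is such that $\tilde x = x^* + \tilde u_x$, where $\tilde u_x = T_2 u$ does not include any components from $\tilde u$. This implies $E(\tilde u_x|x^*, \tilde u) = 0$. Model 3 holds. $\square$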
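The construction in the proof can be sketched numerically. In the code below the dimensions, the random loadings `A` and the simulated data are hypothetical, and $\tilde T_1$, $\tilde T_2$ are taken to be Moore-Penrose pseudoinverses, one convenient choice of left inverse when $A_1$ and $A_2$ have full column rank $d$. The checks confirm $T_1 A = T_2 A = I_d$, so that $\tilde z - x^* = T_1 u$ and $\tilde x - x^* = T_2 u$; the mean-independence part of the lemma is an assumption on $u$ and is not tested here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m = 5 observables, d = 2 factors; A split (row order
# already suitable) into A1 (3 x 2) and A2 (2 x 2), both of rank d.
d = 2
A = rng.normal(size=(5, d))
A1, A2 = A[:3], A[3:]
assert np.linalg.matrix_rank(A1) == d and np.linalg.matrix_rank(A2) == d

# Left inverses via the pseudoinverse: T1_tilde @ A1 = I_d, T2_tilde @ A2 = I_d.
T1_tilde, T2_tilde = np.linalg.pinv(A1), np.linalg.pinv(A2)
T1 = np.hstack([T1_tilde, np.zeros((d, 2))])   # (T1_tilde, 0)
T2 = np.hstack([np.zeros((d, 3)), T2_tilde])   # (0, T2_tilde)

# Simulated data from model (2): z = A x* + u, components of x* dependent.
n = 1000
x_star = rng.normal(size=(n, d)) @ np.array([[1.0, 0.8], [0.0, 0.6]])
u = rng.normal(size=(n, 5))
z = x_star @ A.T + u

z_tilde = z @ T1.T   # z~ = T1 z = x* + T1 u
x_tilde = z @ T2.T   # x~ = T2 z = x* + T2 u
print(np.allclose(z_tilde - x_star, u @ T1.T), np.allclose(x_tilde - x_star, u @ T2.T))
```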



Here dependence among the components of $x^*$ is arbitrary. A general structure with subvectors of $U$ independent of each other, but with components which may be only mean independent (as $u$ here) or arbitrarily dependent (as in $x^*$), is examined by Ben-Moshe (2012). Models of linear systems with full independence were examined by e.g. Li and Vuong (1998). These models lead to systems of first-order differential equations for the characteristic functions.

It may be that there are no independent components $x^*$ and $u$ for which the conditions of Lemma 1 are satisfied. Bonhomme and Robin (2010) proposed to consider products of the observables to increase the number of equations in the system and analyzed conditions for identification; Ben-Moshe (2012) provided necessary and sufficient conditions under which this strategy leads to identification when there may be some dependence.

B. Error correlations with more observables. 

The extension to non-zero $E(u_x|z)$ in model 3 is trivial if this expectation is a known function. A more interesting case results if the errors $u_x$ and $u$ are related, e.g.

$$u_x = \rho u + \eta; \quad \eta \perp z.$$

With an unknown parameter (or function of observables) $\rho$, if more observations are available, more convolution equations can be written to identify all the unknown functions. Suppose that additionally an observation $y$ is available with

$$y = x^* + u_y; \quad u_y = \rho u_x + \eta_y; \quad \eta_y \perp \eta, z.$$

Without loss of generality consider the univariate case and define $w_x = E(x f(z)|z)$, $w_y = E(y f(z)|z)$. Then, with $h(x) = x f_{x^*}(x)$ as in model 3 and $\eta$, $\eta_y$ mean zero, the system of convolution equations expands to

$$\begin{cases} f_{x^*} * f_u = f_z; \\ (1-\rho)\, h * f_u + \rho\, z f(z) = w_x; \\ (1-\rho^2)\, h * f_u + \rho^2 z f(z) = w_y. \end{cases} \tag{3}$$

The three equations have three unknown functions, $f_{x^*}$, $f_u$ and $\rho$. Assuming that the support of $\rho$ does not include the point 1, $\rho$ can be expressed as a solution to a linear algebraic equation derived from the two equations in (3) that include $\rho$ (multiplying the second equation by $1+\rho$ and subtracting the third gives $\rho\,(w_x - z f(z)) = w_y - w_x$):

$$\rho = (w_x - z f(z))^{-1}(w_y - w_x).$$

2.3 Regression models with classical and Berkson errors and the convolution equations

2.3.1 The list of models

The table below provides several regression models and the corresponding convolution equations involving density-weighted conditional expectations.

Table 2. Regression models: 5. Regression with classical measurement error and an additional observation; 6. Regression with Berkson error ($x$, $y$, $z$ are observable); 7. Regression with zero mean measurement error and Berkson instruments.
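Model 5 (regression with classical measurement error and an additional observation).
Distributional assumptions: $y = g(x^*) + v$; $z = x^* + u$; $x = x^* + u_x$; $x^* \perp u$; $E(u) = 0$; $E(u_x|x^*, u) = 0$; $E(v|x^*, u, u_x) = 0$.
Convolution equations: $f_{x^*} * f_u = f_z$; $(g f_{x^*}) * f_u = w$; $h_k * f_u = w_k$, with $h_k(x) = x_k g(x) f_{x^*}(x)$, $k = 1, 2, \dots, d$.
Known functions: $f_z$, $w$, $w_k$.
Unknown functions: $f_{x^*}$, $f_u$, $g$.

Model 6 (regression with Berkson error).
Distributional assumptions: $y = g(x) + v$; $z = x + u$; $E(v|z) = 0$; $z \perp u$; $E(u) = 0$.
Convolution equations: $f_x = f_{-u} * f_z$; $g * f_u = w$.
Known functions: $f_z$, $f_x$, $w$.
Unknown functions: $f_u$, $g$.

Model 7 (regression with zero mean measurement error and Berkson instruments).
Distributional assumptions: $y = g(x^*) + v$; $x = x^* + u_x$; $z = x^* + u$; $z \perp u$; $E(v|z, u, u_x) = 0$; $E(u_x|z, v) = 0$.
Convolution equations: $g * f_u = w$; $h_k * f_u = w_k$, with $h_k(x) = x_k g(x)$, $k = 1, 2, \dots, d$.
Known functions: $w$, $w_k$.
Unknown functions: $f_u$, $g$.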

Notes. Notation: $k = 1, 2, \dots, d$; in model 5, $w = E(y f_z(z)|z)$ and $w_k = E(x_k y f_z(z)|z)$; in model 6, $w = E(y|z)$; in model 7, $w = E(y|z)$ and $w_k = E(x_k y|z)$.

Theorem 2. Under Assumption 1, for each of the models 5-7 the corresponding convolution equations hold.

The proof is in the derivations of the next subsection. 
2.3.2 Discussion of the regression models and derivation of the convolution equations.

5. The nonparametric regression model with classical measurement error and 
an additional observation. 

This type of model was examined by Li (2002) and Li and Hsiao (2004); the convolution equations derived here provide a convenient representation. Models of this type were often considered in semiparametric settings. Butucea and Taupin (2008) (extending the earlier approach by Taupin, 2001) consider a regression function known up to a finite-dimensional parameter, with the mismeasured variable observed with an independent error whose distribution is known. Under the latter condition model 5 here would reduce to the first two equations

$$f_{x^*} * f_u = f_z; \quad (g f_{x^*}) * f_u = w,$$

where $f_u$ is known and the two unknown functions are $g$ (here nonparametric) and $f_{x^*}$.

The model 5 incorporates model 3 for the regressor, and thus the convolution equations from that model apply. An additional convolution equation is derived here; it is obtained by considering the value of the density-weighted conditional expectation in the dual space of generalized functions, S*, applied to an arbitrary $\psi \in S$:

$$(w, \psi) = (E(f_z(z)\, y|z), \psi) = (E(f_z(z)\, g(x^*)|z), \psi);$$

this equals

$$\int\!\!\int g(x^*)\, f_{x^*,z}(x^*, z)\, \psi(z)\, dx^* dz = \int\!\!\int g(x^*)\, f_{x^*,u}(x^*, u)\, \psi(x^* + u)\, dx^* du = \int\!\!\int g(x^*)\, f_{x^*}(x^*)\, f_u(u)\, \psi(x^* + u)\, dx^* du = ((g f_{x^*}) * f_u, \psi).$$

Conditional moments for the regression function need not be integrable or bounded functions of $z$; we require them to be in the space of generalized functions S*.

6. Regression with Berkson error. 

This model may represent the situation when the observed regressor $x$ is correlated with the error $v$, while $z$ (possibly a vector) may represent an instrument uncorrelated with the regression error.

Then, as is known, in addition to the Berkson error convolution equation the equation

$$w = E(y|z) = E(g(x)|z) = \int g(x) f_u(z - x)\,dx = g * f_u$$

holds. This is stated in Meister (2008); however, the approach there is to consider $g$ to be absolutely integrable so that convolution can be defined in the $L_1$ space. Here, by working in the space of generalized functions $S^*$, a much wider nonparametric class of functions that includes regression functions with polynomial growth is allowed.
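The point about polynomial growth can be illustrated numerically. The sketch assumes a hypothetical symmetric error $u \sim N(0, 0.25)$ (so that the sign convention in the Berkson equation plays no role) and $g(x) = x^2$, which is not in $L_1$ but is a regular element of $S^*$; then $(g * f_u)(z) = z^2 + 0.25$, and the quadrature reproduces this.

```python
import numpy as np

SIGMA2 = 0.25  # hypothetical error variance


def f_u(t):
    return np.exp(-t**2 / (2.0 * SIGMA2)) / np.sqrt(2.0 * np.pi * SIGMA2)


def g(x):
    return x**2  # polynomial growth: not integrable, but a regular element of S*


def conv_g_fu(z, half_width=10.0, n=20001):
    """(g * f_u)(z) = integral g(x) f_u(z - x) dx, truncated where f_u is negligible."""
    x = np.linspace(z - half_width, z + half_width, n)
    y = g(x) * f_u(z - x)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


for z in (-3.0, 0.0, 2.5):
    print(z, conv_g_fu(z), z**2 + SIGMA2)
```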
7. Nonparametric regression with error in the regressor, where Berkson-type instruments are assumed available.

This model was proposed by Newey (2001) and examined in the univariate case by Schennach (2007) and Zinde-Walsh (2009), and in the multivariate case in Zinde-Walsh (2012), where the convolution equations given here in Table 2 were derived.

2.4 Convolution equations in models with conditional 
independence conditions. 

All the models 1-7 can be extended to include some additional variables where 
conditionally on those variables, the functions in the model (e.g. conditional 
distributions) are defined and all the model assumptions hold conditionally. 

Evdokimov (2011) derived the conditional version of model 4 from a very general nonparametric panel data model. Model 8 below describes the panel data set-up and how it transforms into the conditional models 4 and 4a, and possibly into model 3 with a relaxed independence condition (if the focus is on identifying the regression function).

Model 8. Panel data model with conditional independence. 

Consider a two-period panel data model with an unknown regression function $m$ and an idiosyncratic (unobserved) component $\alpha_i$:

$$Y_{i1} = m(X_{i1},\alpha_i) + U_{i1};$$
$$Y_{i2} = m(X_{i2},\alpha_i) + U_{i2}.$$

To work with the various conditional characteristic functions, corresponding assumptions ensuring existence of the conditional distributions need to be made; in what follows we assume that all the conditional density functions and moments exist as generalized functions in $S^*$.

In Evdokimov (2011) the regression error is assumed to be independent (conditionally on the corresponding period's $X$) of $\alpha_i$ and of the $X$'s and the error of the other period:

$$f_t = f_{U_{it}|X_{it},\alpha_i,X_{i(-t)},U_{i(-t)}}(u_t|x,\dots) = f_{U_{it}|X_{it}}(u_t|x), \qquad t = 1,2,$$

with $f_{\cdot|\cdot}$ denoting the corresponding conditional densities. Conditionally on $X_{i2} = X_{i1} = x$ the model takes the form 4:

$$z = x^* + u; \qquad x = x^* + u_x,$$

with $z$ representing $Y_{i1}$, $x$ representing $Y_{i2}$, $x^*$ standing in for $m(x,\alpha)$, $u$ for $U_{i1}$ and $u_x$ for $U_{i2}$. The convolution equations derived here for 4 or 4a now apply to conditional densities.
The convolution equations in 4a are similar to those in Evdokimov (2011); they provide equations for $f_u, f_{u_x}$ that do not rely on $f_{x^*}$. Their advantage lies in the possibility of identifying the conditional error distributions without placing the usual non-vanishing restrictions on the characteristic function of $x^*$ (which represents the function $m$ in the panel model).

The panel model can also be considered under relaxed independence assumptions. Here, in the two-period model, we look at forms of dependence that keep the conditional independence of the first-period error but assume only a zero conditional mean of the second-period error, rather than its full independence:

$$f_{U_{i1}|X_{i1},\alpha_i,X_{i2},U_{i2}}(u_1|x,\dots) = f_{U_{i1}|X_{i1}}(u_1|x);$$

$$E(U_{i2}|X_{i1},\alpha_i,X_{i2},U_{i1}) = 0.$$

Then the model maps into model 3, with the functions in the convolution equations representing conditional densities, and allows identification of the distribution of $x^*$ (the function $m$ in the model). However, the conditional distribution of the second-period error is not identified in this set-up.

Evdokimov introduced parametric AR(1) or MA(1) dependence in the errors $U$ and, to accommodate it, extended the model to three periods. Here this would lead, in the AR case, to the equations in (3).

Model 9. Errors in variables regression with classical measurement error 
conditionally on covariates. 
Consider the regression model

$$y = g(x^*,t) + v,$$

with a measurement of unobserved $x^*$ given by $z = x^* + u$, with $x^* \perp u$ conditionally on $t$. Assume that $E(u|t) = 0$ and that $E(v|x^*,t) = 0$. Then, redefining all the densities and conditional expectations to be conditional on $t$, we get the same system of convolution equations as in Table 2 for model 5, with the unknown functions now being conditional densities and the regression function $g$.

Conditioning requires assumptions that provide for existence of condi- 
tional distribution functions in S*. 

3 Solutions for the models. 

3.1 Existence of solutions 

To state results for nonparametric models it is important first to indicate clearly the classes of functions where the solution is sought. Assumption 1 requires that all the (generalized) functions considered are elements of the space of generalized functions $S^*$. This implies that in the equations the operation of convolution applied to two functions from $S^*$ provides an element of $S^*$. This subsection gives high-level assumptions on the nonparametric classes of the unknown functions where the solutions can be sought: any functions from these classes that enter into the convolution provide a result in $S^*$.

No assumptions are needed for existence of convolution and full generality of identification conditions in models 1 and 2, where the model assumptions imply that the functions represent generalized densities. For the other models, including the regression models, convolution is not always defined in $S^*$. Zinde-Walsh (2012) defines the concept of convolution pairs of classes of functions in $S^*$ where convolution can be applied.

To solve the convolution equations a Fourier transform is usually employed, so that, e.g., one transforms generalized density functions into characteristic functions. The Fourier transform is an isomorphism of the space $S^*$. The Fourier transform of a generalized function $a \in S^*$, $Ft(a)$, is defined as follows. For any $\psi \in S$, as usual, $Ft(\psi)(s) = \int \psi(x)e^{i s\cdot x}\,dx$; then the functional $Ft(a)$ is defined by

$$(Ft(a),\psi) = (a, Ft(\psi)).$$

The advantage of applying the Fourier transform is that integral convolution equations transform into algebraic equations when the "exchange formula" applies:

$$a * b = c \iff Ft(a)\cdot Ft(b) = Ft(c). \qquad (4)$$
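For densities, the exchange formula says that the characteristic function of a sum of independent variables is the product of their characteristic functions. A Monte Carlo sketch, assuming hypothetical components $x \sim U[-1,1]$ (with $\phi_x(s) = \sin s/s$) and $u \sim N(0,1)$, compares the empirical characteristic function of $z = x + u$, computed with the same sign convention $e^{isx}$ as $Ft$, with the product $\phi_x\phi_u$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(-1.0, 1.0, n)
u = rng.standard_normal(n)
z = x + u  # independent summands, so f_z = f_x * f_u


def ecf(sample, s_values):
    """Empirical characteristic function: average of exp(i s Z) over the sample."""
    return np.array([np.exp(1j * s * sample).mean() for s in s_values])


s = np.linspace(-4.0, 4.0, 21)
phi_x = np.sinc(s / np.pi)   # sin(s)/s, with value 1 at s = 0
phi_u = np.exp(-s**2 / 2.0)  # N(0,1) characteristic function
max_err = np.max(np.abs(ecf(z, s) - phi_x * phi_u))
print(f"max |ecf(z) - phi_x * phi_u| = {max_err:.4f}")
```

The remaining discrepancy is sampling error of order $n^{-1/2}$.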

In the space of generalized functions $S^*$ the Fourier transform and inverse Fourier transform always exist. As shown in Zinde-Walsh (2012), there is a correspondence between convolution pairs of subspaces in $S^*$ and the corresponding product pairs of subspaces of their Fourier transforms.

The classical pairs of spaces (Schwartz, 1966) are the convolution pair $(S^*, O_C^*)$ and the corresponding product pair $(S^*, O_M)$, where $O_C^*$ is the subspace of $S^*$ that contains rapidly decreasing (faster than any polynomial) generalized functions and $O_M$ is the space of infinitely differentiable functions with every derivative growing no faster than a polynomial at infinity. These pairs are important in that no restriction is placed on one of the generalized functions, which could be any element of the space $S^*$; the other belongs to a space that needs to be correspondingly restricted. A disadvantage of the classical pairs is that the restriction is fairly severe; for example, the requirement that a characteristic function be in $O_M$ implies existence of all moments of the random variable. Relaxing this restriction would require placing constraints on the other space in the pair; Zinde-Walsh (2012) introduces some pairs that incorporate such trade-offs.

In some models the product of a function with a component of the vector of arguments is involved, such as $d(x) = x_k a(x)$; then for the Fourier transforms $Ft(d)(s) = -i\frac{\partial}{\partial s_k}Ft(a)(s)$: multiplication by a variable is transformed into $(-i)$ times the corresponding partial derivative. Since the differentiation operators are continuous in $S^*$, this transformation does not present a problem.
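The rule $Ft(x_k a)(s) = -i\,\partial_{s_k}Ft(a)(s)$ can be checked numerically in one dimension. The sketch assumes $a$ is the standard normal density, so $Ft(a)(s) = e^{-s^2/2}$, and compares a quadrature of $\int x\,a(x)e^{isx}dx$ with $-i$ times a central-difference derivative of $e^{-s^2/2}$ (and with the exact value $i s e^{-s^2/2}$); grid and step size are illustrative choices.

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 30001)
a = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)  # standard normal density


def ft(values, s):
    """Ft(psi)(s) = integral psi(x) exp(i s x) dx, by the trapezoid rule on x."""
    y = values * np.exp(1j * s * x)
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0


def phi(s):
    return np.exp(-s**2 / 2.0)  # Ft(a) in closed form


def minus_i_derivative(s, h=1e-5):
    """-i times a central-difference approximation of d/ds Ft(a)(s)."""
    return -1j * (phi(s + h) - phi(s - h)) / (2.0 * h)


for s in (0.5, 1.0, 2.0):
    print(s, ft(x * a, s), minus_i_derivative(s), 1j * s * phi(s))
```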

Assumption 2. The functions $a \in A$, $b \in B$ are such that $(A,B)$ form a convolution pair in $S^*$.

Equivalently, $Ft(a)$, $Ft(b)$ are in the corresponding product pair of spaces.
Assumption 2 is applied to model 1 for $a = f_{x^*}$, $b = f_u$; to model 2 with $a = f_z$, $b = f_u$; to model 3 with $a = f_{x^*}$, $b = f_u$ and with $a = h_k$, $b = f_u$ for all $k = 1,\dots,d$; to model 4a for $a = f_{x^*}$, or $f_{u_x}$, or $h_k$ for all $k$, and $b = f_u$; to model 5 with $a = f_{x^*}$, or $gf_{x^*}$, or $h_kf_{x^*}$, and $b = f_u$; to model 6 with $a = f_z$ or $g$, and $b = f_u$; to model 7 with $a = g$ or $h_k$, and $b = f_u$.

Assumption 2 is a high-level assumption that is a sufficient condition for 
a solution to the models 1-4 and 6-7 to exist. Some additional conditions are 
needed for model 5 and are provided below. 

Assumption 2 is automatically satisfied for generalized density functions, so it is not needed for models 1 and 2. Denote by $D \subset S^*$ the subset of generalized derivatives of distribution functions (corresponding to Borel probability measures in $R^d$); then in models 1 and 2 $A = B = D$, and for the characteristic functions there are correspondingly no restrictions; denote the set of all characteristic functions, $Ft(D) \subset S^*$, by $C$.

Below is a (non-exhaustive) list of nonparametric classes of generalized functions that provide sufficient conditions for existence of solutions to the models here. The classes are such that they place minimal or often no restrictions on one of the functions and restrict the class of the other so that the assumptions are satisfied.

In models 3 and 4 the functions $h_k$ are transformed into derivatives of continuous characteristic functions. An assumption that either the characteristic function of $x^*$ or the characteristic function of $u$ is continuously differentiable is sufficient, without any restrictions on the other, to ensure that Assumption 2 holds. Denote the subset of all continuously differentiable characteristic functions by $C^{(1)}$.

In model 5 the equations involve a product of the regression function $g$ with $f_{x^*}$. Products of generalized functions in $S^*$ do not always exist, so additional restrictions are needed in that model. If $g$ is an arbitrary element of $S^*$, then for the product to exist $f_{x^*}$ should be in $O_M$. On the other hand, if $f_{x^*}$ is an arbitrary generalized density, then for $gf_{x^*}$ and $h_kf_{x^*}$ to be elements of $S^*$ it is sufficient that $g$ and $h_k$ belong to the space of $d$ times continuously differentiable functions with derivatives majorized by polynomial functions. Indeed, the value of the functional $h_kf_{x^*}$ for an arbitrary $\psi \in S$ is defined by

$$(h_kf_{x^*},\psi) = (-1)^d \int F_{x^*}(x)\,\partial^{(1,\dots,1)}\big(h_k(x)\psi(x)\big)\,dx;$$

here $F_{x^*}$ is the (ordinary, bounded) distribution function, and this integral exists because $\psi$ and all its derivatives go to zero at infinity faster than any polynomial function. Denote by $S^{(d)}$ the space of $d$ times continuously differentiable functions $g \in S^*$ such that the functions $h_k(x) = x_kg(x)$ are also $d$ times continuously differentiable, with all derivatives majorized by polynomial functions. Since the products are in $S^*$, their Fourier transforms are defined in $S^*$. Further restrictions requiring the Fourier transforms of the products $gf_{x^*}$ and $h_kf_{x^*}$ to be continuously differentiable functions in $S^*$ would remove any restrictions on $f_u$ for the convolution to exist. Denote the space of all continuously differentiable functions in $S^*$ by $S^{(1)}$.
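The distribution-function form of $(h_kf_{x^*},\psi)$ works even when $f_{x^*}$ is not an ordinary function. A sketch with $d = 1$, assuming a hypothetical law that puts mass 1/2 at 0 and spreads the other 1/2 as an Exp(1) distribution, with hypothetical $h(x) = 1 + x^2$ and $\psi(x) = e^{-x^2}$: the direct value $\tfrac12 h(0)\psi(0) + \tfrac12\int_0^\infty h\psi e^{-x}dx$ is compared with $-\int F(x)(h\psi)'(x)\,dx$, where here $(h\psi)' = -2x^3e^{-x^2}$.

```python
import numpy as np


def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


x = np.linspace(0.0, 12.0, 120001)  # F = 0 for x < 0, so only x >= 0 matters


def F(x):
    """Hypothetical cdf for x >= 0: point mass 1/2 at 0 plus 1/2 times an Exp(1) cdf."""
    return 0.5 + 0.5 * (1.0 - np.exp(-x))


def h(x):
    return 1.0 + x**2  # polynomially bounded, as required


def psi(x):
    return np.exp(-x**2)  # a test function in S


def d_h_psi(x):
    return -2.0 * x**3 * np.exp(-x**2)  # (h * psi)' worked out by hand


# Direct value: the atom contributes 0.5 * h(0) * psi(0); the Exp(1) part has density 0.5 e^{-x}.
direct = 0.5 * h(0.0) * psi(0.0) + trapezoid(h(x) * psi(x) * 0.5 * np.exp(-x), x)
# Distribution-function form with d = 1: (h f, psi) = (-1)^1 * integral of F (h psi)' dx.
via_F = -trapezoid(F(x) * d_h_psi(x), x)
print(direct, via_F)
```

The atom at 0, which has no ordinary density, is handled automatically by the jump in $F$.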
If $g$ is an ordinary function that represents a regular element in $S^*$, the infinite differentiability condition on $f_{x^*}$ can be relaxed to simply requiring continuous first derivatives.

In models 6 and 7, if the generalized density function of the error, $f_u$, decreases faster than any polynomial (all moments need to exist for that), so that $f_u \in O_C^*$, then $g$ could be any generalized function in $S^*$; this will of course hold if $f_u$ has bounded support. Generally, the more moments the error is assumed to have, the fewer restrictions on the regression function $g$ are needed to satisfy the convolution equations of the model and the exchange formula. Models 6 and 7 satisfy the assumptions for any error $u$ when the support of the generalized function $g$ is compact (as for a "sum of peaks"); then $g \in E^* \subset S^*$, where $E^*$ is the space of generalized functions with compact support. More generally, $g$ and all the $h_k$ could belong to the space of generalized functions that decrease at infinity faster than any polynomial, and still no restrictions need be placed on $u$.

For any generalized density function $f_\cdot$, denote the corresponding characteristic function $Ft(f_\cdot)$ by $\phi_\cdot$. Denote the Fourier transform of the (generalized) regression function $g$, $Ft(g)$, by $\gamma$.

The following table summarizes some fairly general sufficient conditions 
on the models that place restrictions on the functions themselves or on the 
characteristic functions of distributions in the models that will ensure that 
Assumption 2 is satisfied and a solution exists. The nature of these assump- 
tions is to provide restrictions on some of the functions that allow the others 
to be completely unrestricted for the corresponding model.



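Table 3. Some nonparametric classes of generalized functions for which the convolution equations of the models are defined in $S^*$.

Model | Sufficient assumptions A | Sufficient assumptions B
1 | no restrictions: $\phi_{x^*}, \phi_u \in C$ | no restrictions: $\phi_{x^*}, \phi_u \in C$
2 | no restrictions: $\phi_z, \phi_u \in C$ | no restrictions: $\phi_z, \phi_u \in C$
3 | any $\phi_{x^*} \in C$; $\phi_u \in C^{(1)}$ | any $\phi_u \in C$; $\phi_{x^*} \in C^{(1)}$
4 | any $\phi_{u_x}, \phi_{x^*} \in C$; $\phi_u \in C^{(1)}$ | any $\phi_u \in C$; $\phi_{x^*} \in C^{(1)}$
4a | any $\phi_{u_x}, \phi_{x^*} \in C$; $\phi_u \in C^{(1)}$ | any $\phi_u, \phi_{u_x} \in C$; $\phi_{x^*} \in C^{(1)}$
5 | any $g \in S^*$; $f_{x^*} \in O_M$; $f_u \in O_C^*$ | any $f_{x^*} \in D$; $g, h_k \in S^{(d)}$; $f_u \in O_C^*$
6 | any $g \in S^*$; $f_u \in O_C^*$ | $g \in O_C^*$; any $f_u$: $\phi_u \in C$
7 | any $g \in S^*$; $f_u \in O_C^*$ | $g \in O_C^*$; any $f_u$: $\phi_u \in C$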


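The next table states the equations and systems of equations for Fourier transforms that follow from the convolution equations.

Table 4. The form of the equations for the Fourier transforms:

Model | Equations for Fourier transforms | Unknown functions
1 | $\phi_{x^*}\phi_u = \phi_z$ | $\phi_{x^*}$
2 | $\phi_{x^*} = \phi_z\phi_u$ | $\phi_{x^*}$
3 | $\phi_{x^*}\phi_u = \phi_z$; $(\phi_{x^*})'_k\phi_u = \varepsilon_k$, $k = 1,\dots,d$ | $\phi_{x^*}, \phi_u$
4 | $\phi_{x^*}\phi_u = \phi_z$; $(\phi_{x^*})'_k\phi_u = \varepsilon_k$, $k = 1,\dots,d$; $\phi_{x^*}\phi_{u_x} = \phi_x$ | $\phi_{x^*}, \phi_u, \phi_{u_x}$
4a | $\phi_{u_x}\phi_u = \phi_{z-x}$; $(\phi_{u_x})'_k\phi_u = \varepsilon_k$, $k = 1,\dots,d$; $\phi_{x^*}\phi_{u_x} = \phi_x$ | $\phi_{x^*}, \phi_u, \phi_{u_x}$
5 | $\phi_{x^*}\phi_u = \phi_z$; $Ft(gf_{x^*})\phi_u = \varepsilon$; $(Ft(gf_{x^*}))'_k\phi_u = \varepsilon_k$, $k = 1,\dots,d$ | $\phi_{x^*}, \phi_u, g$
6 | $\phi_x = \phi_u\phi_z$; $Ft(g)\phi_u = \varepsilon$ | $\phi_u, g$

Notes. The notation $(\cdot)'_k$ denotes the $k$-th partial derivative of the function. The functions $\varepsilon$ are Fourier transforms of the corresponding $w$, and $\varepsilon_k = -iFt(w_k)$, defined for the models in Tables 1 and 2.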

Assumption 2 (which is fulfilled, e.g., by the classes of generalized functions in Table 3) ensures existence of solutions to the convolution equations for models 1-7; this does not exclude multiple solutions, and the next section provides a discussion of solutions for the equations in Table 4.

3.2 Classes of solutions; support and multiplicity of 
solutions 

Typically, support assumptions are required to restrict the multiplicity of solutions; here we examine the dependence of solutions on the support of the functions. The results here also give conditions under which some zeros, e.g. in the characteristic functions, are allowed. Thus, in common with e.g. Carrasco and Florens (2010) and Evdokimov and White (2011), distributions such as the uniform or triangular, for which the characteristic function has isolated zeros, are not excluded. The difference here is the extension of the analysis of solutions to $S^*$ and to models such as the regression model, where this approach to relaxing support assumptions was not previously considered.

Recall that for a continuous function $\psi(x)$ on $R^d$ support is defined as the set $W = \mathrm{supp}(\psi)$ such that

$$W = \{x \in R^d : \psi(x) \neq 0\}.$$

With this definition the support of a continuous function is an open set.

Generalized functions are functionals on the space $S$, and the support of a generalized function $b \in S^*$ is defined as follows (Schwartz, 1967, p. 28). Denote by $(b,\psi)$ the value of the functional $b$ for $\psi \in S$. Define a null set for $b \in S^*$ as the union of supports of all functions in $S$ for which the value of the functional is zero: $\Omega = \bigcup\{\mathrm{supp}(\psi) : \psi \in S,\ (b,\psi) = 0\}$. Then $\mathrm{supp}(b) = R^d \setminus \Omega$. Note that the support of a generalized function is a closed set; for example, the support of the $\delta$-function is the single point $0$.

Note that for model 2 Table 4 gives the solution for $\phi_{x^*}$ directly, and the inverse Fourier transform provides the (generalized) density function $f_{x^*}$.

In Zinde-Walsh (2012) identification conditions in $S^*$ were given for models 1 and 7 under assumptions that include the ones in Table 3 but could also be more flexible.

The equations in Table 4 for models 1, 3, 4, 4a, 5, 6 and 7 are of two types, similar to those solved in Zinde-Walsh (2012). One is a convolution with one unknown function; the other is a system of equations with two unknown functions; each leads to the corresponding equations for their Fourier transforms.

3.2.1 Solutions to the equation $\alpha\beta = \gamma$.

Consider the equation

$$\alpha\beta = \gamma, \qquad (5)$$

with one unknown function $\alpha$; $\beta$ is a given continuous function. By Assumption 2 the nonparametric class for $\alpha$ is such that the equation holds in $S^*$ on $R^d$; it is also possible to consider a nonparametric class for $\alpha$ with restricted support $W$. Without any restrictions, of course, $W = R^d$. Recall the differentiation operator $\partial^m$ for $m = (m_1,\dots,m_d)$ and denote by $\mathrm{supp}(\beta,\partial)$ the set $\bigcup_{\Sigma m_i \geq 0}\mathrm{supp}(\partial^m\beta)$, where $\mathrm{supp}(\partial^m\beta)$ is the open support of the continuous derivative $\partial^m\beta$ where it exists. Any point where $\beta$ is zero belongs to this set if some finite-order continuous partial derivative of $\beta$ is not zero at that point (and in some open neighborhood); for $\beta$ itself, $\mathrm{supp}(\beta) = \mathrm{supp}(\beta,0)$.
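Define the functions

$$\alpha_1 = \beta^{-1}\gamma\,I(\mathrm{supp}(\beta,\partial)); \qquad \alpha_2(x) = \begin{cases} 1 & \text{for } x \in \mathrm{supp}(\beta,\partial); \\ a & \text{for } x \in W\setminus\mathrm{supp}(\beta,\partial), \end{cases} \qquad (6)$$

with any $a$ such that $\alpha_1\alpha_2 \in Ft(A)$.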

Consider the case when $\alpha$, $\beta$, and thus $\gamma$ are continuous. For any point $x_0$, if $\beta(x_0) \neq 0$, there is a neighborhood $N(x_0)$ where $\beta \neq 0$, and division by $\beta$ is possible. If $\beta$ has a zero at $x_0$, it can only be of finite order, and in some neighborhood $N(x_0) \subset \mathrm{supp}(\partial^m\beta)$ a representation

$$\beta = \eta(x)\prod_{i=1}^d (x_i - x_{0i})^{m_i} \qquad (7)$$

holds for some continuous function $\eta$ in $S^*$ such that $|\eta| > c_\eta > 0$ on $\mathrm{supp}(\eta)$. Then $\eta^{-1}\gamma$ in $N(x_0)$ is a continuous function; division of such a function by $\prod_{i=1}^d (x_i - x_{0i})^{m_i}$ in $S^*$ is defined (Schwartz, 1967, pp. 125-126), so division by $\beta$ is defined in this neighborhood $N(x_0)$. For the set $\mathrm{supp}(\beta,\partial)$, consider a covering of every point by such neighborhoods; the possibility of division in each neighborhood leads to the possibility of division globally on the whole of $\mathrm{supp}(\beta,\partial)$. Then $\alpha_1$ as defined in (6) exists in $S^*$.
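The division step near a finite-order zero can be sketched numerically in one dimension. Assuming, for illustration, $\beta(s) = \sin s/s$ (the characteristic function of $U[-1,1]$, with simple zeros at $s = k\pi$) and a hypothetical unknown $\alpha(s) = e^{-s^2/2}$, the quotient $\gamma/\beta$ is computed directly away from zeros and, at a simple zero $s_0$, as $\gamma'(s_0)/\beta'(s_0)$, which is what the local representation (7) with $m = 1$ amounts to, since then $\eta(s_0) = \beta'(s_0)$. The helper names and tolerances are assumptions of the sketch.

```python
import numpy as np


def beta(s):
    return np.sinc(s / np.pi)  # sin(s)/s; simple zeros at s = k*pi, k != 0


def alpha(s):
    return np.exp(-s**2 / 2.0)  # hypothetical "unknown" function


def gamma(s):
    return alpha(s) * beta(s)  # known right-hand side of alpha * beta = gamma


def derivative(f, s, h=1e-5):
    return (f(s + h) - f(s - h)) / (2.0 * h)


def divide(s, tol=1e-10):
    """Hypothetical helper: gamma/beta, using gamma'/beta' at a simple zero of beta."""
    b = beta(s)
    if abs(b) > tol:
        return gamma(s) / b
    # Simple zero: beta = eta(s) (s - s0) with eta(s0) = beta'(s0) != 0.
    return derivative(gamma, s) / derivative(beta, s)


for s in (1.3, np.pi, 2.0 * np.pi):
    print(s, divide(s), alpha(s))
```

Zeros of higher order would need correspondingly higher derivatives; the sketch only covers simple zeros.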

In the case where $\gamma$ is an arbitrary generalized function, if $\beta$ is infinitely differentiable then, by Schwartz (1967, pp. 126-127), division by $\beta$ is defined on $\mathrm{supp}(\beta,\partial)$ and the solution is given by (6).

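For the cases where $\gamma$ is not continuous and $\beta$ is not infinitely differentiable, the solution is provided by

$$\alpha_1 = \beta^{-1}\gamma\,I(\mathrm{supp}(\beta,0)); \qquad \alpha_2(x) = \begin{cases} 1 & \text{for } x \in \mathrm{supp}(\beta,0); \\ a & \text{for } x \in W\setminus\mathrm{supp}(\beta,0), \end{cases}$$

with any $a$ such that $\alpha_1\alpha_2 \in Ft(A)$.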

Theorem 2 in Zinde-Walsh (2012) implies that the solution to (5) is $a = Ft^{-1}(\alpha_1\alpha_2)$; the sufficient condition for the solution to be unique is $\mathrm{supp}(\beta,0) \supseteq W$; if additionally either $\gamma$ is a continuous function or $\beta$ is an infinitely continuously differentiable function, it is sufficient for uniqueness that $\mathrm{supp}(\beta,\partial) \supseteq W$.

This provides solutions for models 1 and 6 where only equations of this 
type appear. 
3.2.2 Solutions to the system of equations

For models 3, 4, 5 and 7 a system of equations of the form

$$\alpha\beta = \gamma; \qquad \alpha\beta'_k = \gamma_k, \quad k = 1,\dots,d \qquad (8)$$

(with $\beta$ continuously differentiable) arises. Theorem 3 in Zinde-Walsh (2012) provides the solution and uniqueness conditions for this system of equations. It is first established that a set of continuous functions $\chi_k$, $k = 1,\dots,d$, that solves the equation

$$\chi_k\gamma - \gamma_k = 0 \qquad (9)$$

in the space $S^*$ exists and is unique on $W = \mathrm{supp}(\gamma)$ as long as $\mathrm{supp}(\beta) \supseteq W$. Then $\beta'_k\beta^{-1} = \chi_k$, and substitution into (8) leads to a system of first-order differential equations in $\beta$.
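For $d = 1$ the recipe is short: compute $\chi = \gamma_1/\gamma$, integrate it from $\zeta_0 = 0$, where $\beta(0) = 1$ for a characteristic function, and exponentiate. The sketch below assumes, purely for illustration, $\beta(s) = 1/(1+s^2)$ (a Laplace characteristic function) and $\alpha(s) = e^{-s^2/2}$, builds $\gamma = \alpha\beta$ and $\gamma_1 = \alpha\beta'$ exactly, and recovers both $\beta$ and $\alpha = \gamma/\beta$ on a grid by the cumulative trapezoid rule. In an application $\gamma$ and $\gamma_1$ would be estimated.

```python
import numpy as np

s = np.linspace(-5.0, 5.0, 10001)

beta_true = 1.0 / (1.0 + s**2)             # hypothetical beta (Laplace cf)
dbeta_true = -2.0 * s / (1.0 + s**2) ** 2  # its derivative
alpha_true = np.exp(-s**2 / 2.0)           # hypothetical alpha (normal cf)

gamma = alpha_true * beta_true     # alpha * beta = gamma
gamma_1 = alpha_true * dbeta_true  # alpha * beta' = gamma_1

chi = gamma_1 / gamma  # solves chi * gamma - gamma_1 = 0 where gamma != 0

# Cumulative trapezoid integral of chi from s[0], then re-anchor at zeta_0 = 0.
cum = np.concatenate(([0.0], np.cumsum((chi[1:] + chi[:-1]) * np.diff(s) / 2.0)))
i0 = int(np.argmin(np.abs(s)))
beta_hat = np.exp(cum - cum[i0])  # beta(0) = 1 fixes the constant of integration
alpha_hat = gamma / beta_hat

print(np.max(np.abs(beta_hat - beta_true)), np.max(np.abs(alpha_hat - alpha_true)))
```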

Case 1. Continuous functions; W is an open set. 

For models 3 and 4 the system (8) involves continuous characteristic functions, so there $W$ is an open set. In some cases $W$ can be an open set under the conditions of models 5 and 7, e.g. if the regression function is integrable in model 7.

For this case represent the open set $W$ as a union of (maximal) connected components $\bigcup_\nu W_\nu$.

Then, by the same arguments as in the proof of Theorem 3 in Zinde-Walsh (2012), the solution can be given uniquely on $W_\nu$ as long as at some point $\zeta_{0\nu} \in (\mathrm{supp}(\beta,\partial) \cap W_\nu)$ the value $\beta(\zeta_{0\nu})$ is known for each of the connected components. Consider then

$$\beta_1(\zeta) = \sum_\nu \beta(\zeta_{0\nu})\left[\exp\int_{\zeta_{0\nu}}^{\zeta}\sum_{k=1}^d \chi_k(\xi)\,d\xi_k\right] I(W_\nu),$$

where integration is along any arc within the component that connects $\zeta$ to $\zeta_{0\nu}$. Then $\alpha_1 = \beta_1^{-1}\gamma$, and $\alpha_2, \beta_2$ are defined as above by being 1 on $\bigcup_\nu W_\nu$ and arbitrary outside of this set.

When $\beta(0) = 1$, as is the case for a characteristic function, the function is uniquely determined on the connected component that includes 0.

Evdokimov and White (2012) provide a construction that, in the univariate case, permits extending the solution $\beta(\zeta_{0\nu})[\exp\int_{\zeta_{0\nu}}^{\zeta}\sum_{k=1}^d\chi_k(\xi)\,d\xi_k]I(W_\nu)$ from a connected component of the support where $\beta(\zeta_{0\nu})$ is known (e.g. at 0 for a characteristic function) to a contiguous connected component, provided that on the border between the two, where $\beta = 0$, at least some finite-order derivative of $\beta$ is not zero. In the multivariate case this approach can be extended by the same construction along a one-dimensional arc from one connected component to the other. Thus identification is possible on a connected component of $\mathrm{supp}(\beta,\partial)$.

Case 2. $W$ is a closed set.

Generally for models 5 and 7, $W$ is the support of a generalized function and is a closed set. It may intersect with several connected components of the support of $\beta$. Denote here by $W_\nu$ the intersection of a connected component of the support of $\beta$ with $W$. Then, similarly,

$$\beta_1(\zeta) = \sum_\nu \beta(\zeta_{0\nu})\left[\exp\int_{\zeta_{0\nu}}^{\zeta}\sum_{k=1}^d\chi_k(\xi)\,d\xi_k\right] I(W_\nu),$$

where integration is along any arc within the component that connects $\zeta$ to $\zeta_{0\nu}$. Then $\alpha_1 = \beta_1^{-1}\gamma$, and $\alpha_2, \beta_2$ are defined as above by being 1 on $\bigcup_\nu W_\nu$ and arbitrary outside of this set. The issue of the value of $\beta$ at some point within each connected component arises. In the case of $\beta$ being a characteristic function, if there is only one connected component $W$ and $0 \in W$, the solution is unique, since then $\beta(0) = 1$.

Note that for model 5 the solution to equations of the type (8) would only provide $Ft(gf_{x^*})$ and $\phi_u$; then $\phi_{x^*}$ can be obtained from the first equation for this model in Table 4; it is unique if $\mathrm{supp}\,\phi_{x^*} = \mathrm{supp}\,\phi_z$. To solve for $g$, find $g = Ft^{-1}(Ft(gf_{x^*}))\cdot(f_{x^*})^{-1}$.

4 Identification, partial identification and well- 
posedness 

4.1 Identified solutions for the models 1-7 

As follows from the discussion of the solutions, uniqueness in models 1, 2, 3, 4, 4a, 5 and 6 holds (in a few cases up to the value of a function at a point) if all the Fourier transforms are supported on the whole of $R^d$; in many cases it is sufficient that $\mathrm{supp}(\beta,\partial) = R^d$.

The classes of functions could be defined with Fourier transforms supported on some known subset $W$ of $R^d$ rather than on the whole space; if all the functions considered have $W$ as their support, and the support consists of one connected component that includes 0 as an interior point, then identification of the solutions holds. For the next table assume that $W$ is a single connected component with 0 as an interior point; again $W$ could coincide with $\mathrm{supp}(\beta,\partial)$. For model 5 under Assumption B assume additionally that the value at zero, $Ft(gf_{x^*})(0)$, is known; similarly, for model 7 under Assumption B additionally assume that $Ft(g)(0)$ is known.
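Table 5. The solutions for identified models on $W$.

Model | Solution to equations
1. | $f_{x^*} = Ft^{-1}(\phi_u^{-1}\phi_z)$
2. | $f_{x^*} = Ft^{-1}(\phi_z\phi_u)$
3. | Under Assumption A: $f_u = Ft^{-1}\left(\exp\int_0^\zeta\sum_{k=1}^d\chi_k(\xi)\,d\xi_k\right)$, where $\chi_k$ solves $\chi_k\phi_z - [(\phi_z)'_k - \varepsilon_k] = 0$. Under Assumption B: $f_{x^*} = Ft^{-1}\left(\exp\int_0^\zeta\sum_{k=1}^d\chi_k(\xi)\,d\xi_k\right)$, where $\chi_k$ solves $\chi_k\phi_z - \varepsilon_k = 0$.
4. | $f_{x^*}$, $f_u$ obtained similarly to those in 3.
4a. | $f_{u_x}$, $f_u$ obtained similarly to $\phi_{x^*}$, $\phi_u$ in 3.
5. | Three steps: 1. Get $Ft(gf_{x^*})$, $\phi_u$ similarly to $\phi_{x^*}$, $\phi_u$ in model 3 (under Assumption A use $Ft(gf_{x^*})(0)$); 2. Obtain $\phi_{x^*} = \phi_u^{-1}\phi_z$; 3. Get $g = [Ft^{-1}(\phi_{x^*})]^{-1}Ft^{-1}(Ft(gf_{x^*}))$.
6. | $\phi_u = \phi_z^{-1}\phi_x$ and $g = Ft^{-1}(\phi_x^{-1}\phi_z\varepsilon)$.
7. | $\phi_{x^*}$, $Ft(g)$ obtained similarly to $\phi_{x^*}$, $\phi_u$ in 3 (under Assumption A use $Ft(g)(0)$).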



4.2 Implications of partial identification. 

Consider the case of Model 1. Lack of identification, say when the error distribution has a characteristic function supported on a convex domain $W_u$ around zero, results in the solution $\phi_{x^*} = \phi_1\phi_2$, with $\phi_1$ nonzero and unique on $W_u$, thus capturing the lower-frequency components of $x^*$, and with $\phi_2$ a characteristic function of a distribution with arbitrary high-frequency components. Transforming back to densities provides a corresponding model with independent components

$$z = x_1^* + x_2^* + u,$$

where $x_1^*$ uniquely extracts the lower-frequency part of observed $z$. The more important the contribution of $x_1^*$ to $x^*$, the less important is the lack of identification.
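The lack of identification can be made concrete with Pólya-type characteristic functions (real, even, convex on $(0,\infty)$, decreasing to 0, equal to 1 at 0). Assuming a hypothetical error with $\phi_u(s) = \max(1-|s|,0)$, so that $W_u = [-1,1]$, the two candidate characteristic functions $\phi_a$ and $\phi_b$ below share the same $\phi_1$ on $[-1,1]$ but differ outside it, yet produce exactly the same $\phi_z = \phi_{x^*}\phi_u$.

```python
import numpy as np


def phi_u(s):
    return np.maximum(1.0 - np.abs(s), 0.0)  # Polya cf supported on [-1, 1]


def phi_a(s):
    return np.maximum(1.0 - np.abs(s) / 2.0, 0.0)  # Polya cf


def phi_b(s):
    # Same as phi_a on [-1, 1]; for |s| > 1 an exponential tail with matching
    # value and slope, so the function stays convex and decreasing on (0, inf).
    t = np.abs(s)
    return np.where(t <= 1.0, 1.0 - t / 2.0, 0.5 * np.exp(-(t - 1.0)))


s = np.linspace(-6.0, 6.0, 1201)
same_z = np.allclose(phi_a(s) * phi_u(s), phi_b(s) * phi_u(s))
gap = np.max(np.abs(phi_a(s) - phi_b(s)))
print(same_z, gap)  # identical phi_z, although phi_a and phi_b differ
```

Any feature of $x^*$ that depends only on frequencies in $[-1,1]$ is the same under both candidates.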

If the feature of interest as discussed e.g. by Matzkin (2007) involves only 
low frequency components of x*, it may still be fully identified even when 
the distribution for x* itself is not. An example of that is a deconvolution 
applied to an image of a car captured by a traffic camera; although even after 
deconvolution the image may still appear blurry the licence plate number may 
be clearly visible. In nonparametric regression the polynomial growth of the 
regression or the expectation of the response function may be identifiable 
even if the regression function is not fully identified. 

Features that are identified include any functional $\Phi$, linear or nonlinear, on a class of functions of interest, such that in the frequency domain $\Phi$ is supported on $W_u$.

4.3 Well-posedness in S* 

Conditions for well-posedness in $S^*$ for solutions of the equations entering models 1-7 were established in Zinde-Walsh (2012). Well-posedness is needed to ensure that if a sequence of functions converges (in the topology of $S^*$) to the known functions of the equations characterizing models 1-7 in Tables 1 and 2, then the corresponding sequence of solutions converges to the solution for the limit functions. A feature of well-posedness in $S^*$ is that the solutions are considered in a class of functions that is a bounded set in $S^*$.

The properties that differentiation is a continuous operation and that the Fourier transform is an isomorphism of the topological space $S^*$ make conditions for convergence in this space much weaker than those in function spaces such as $L_1$ or $L_2$. Thus, for a density given by the generalized derivative of the distribution function, well-posedness holds in spaces of generalized functions by the continuity of the differentiation operator.

For the problems here, however, well-posedness does not always obtain. The main sufficient condition is that the inverse of the characteristic function of the measurement error satisfy condition (10) with $b = \phi_u^{-1}$ on the corresponding support. This holds if either the support is bounded or the distribution is not super-smooth. If $\phi_u$ has some zeros but satisfies the identification conditions, so that it has the local representation (7) where (10) is satisfied for $b = \eta^{-1}$, well-posedness will hold.

An example in Zinde-Walsh (2012) demonstrates that well-posedness of deconvolution does not hold even in the weak topology of $S^*$ for super-smooth (e.g. Gaussian) distributions on unbounded support. On the other hand, well-posedness of deconvolution in $S^*$ obtains for ordinary smooth distributions, and thus under less restrictive conditions than in the function spaces usually considered, such as $L_1$ or $L_2$.
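The contrast between ordinary smooth and super-smooth errors shows up in the growth of $1/\phi_u$, the factor that multiplies any perturbation of $\phi_z$ in deconvolution. A sketch with hypothetical unit-scale errors: for the Laplace law $1/\phi_u(s) = 1+s^2$ is bounded by a polynomial, while for the Gaussian law $1/\phi_u(s) = e^{s^2/2}$ eventually exceeds any bound of the form $V(1+s^2)^m$. The helper below is only a simple proxy for a polynomial-growth requirement on a finite grid; it is not a reproduction of condition (10).

```python
import numpy as np


def inv_phi_laplace(s):
    return 1.0 + s**2  # 1/phi_u for Laplace(0, 1): ordinary smooth


def inv_phi_gauss(s):
    return np.exp(s**2 / 2.0)  # 1/phi_u for N(0, 1): super-smooth


def polynomially_bounded(inv_phi, m, V, s_grid):
    """Proxy check (hypothetical helper): |inv_phi(s)| <= V (1 + s^2)^m on s_grid."""
    return bool(np.all(np.abs(inv_phi(s_grid)) <= V * (1.0 + s_grid**2) ** m))


s_grid = np.linspace(-30.0, 30.0, 6001)
print(polynomially_bounded(inv_phi_laplace, m=1, V=1.0, s_grid=s_grid))  # True
print(polynomially_bounded(inv_phi_gauss, m=10, V=1e6, s_grid=s_grid))   # False
```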

In models 3-7, with several unknown functions, more conditions are required to ensure that all the operations by which the solutions are obtained are continuous in the topology of $S^*$. It may not be sufficient to assume (10) for the inverses of the unknown functions where the solution requires division; for continuity of the solution the condition may need to apply uniformly.

Define a class of ordinary functions on $R^d$, $\Phi(m,V)$ (with $m$ a vector of integers and $V$ a positive constant), where $b \in \Phi(m,V)$ if condition (10) holds.

Then in Zinde-Walsh (2012) well-posedness is proved for model 7 as long as, in addition to Assumption A or B, for some $\Phi(m,V)$ both $\phi_u$ and $\phi_u^{-1}$ belong to the class $\Phi(m,V)$. This condition is fulfilled by a $\phi_u$ that is not super-smooth; this could be an ordinary smooth distribution or a mixture with some mass point.

A convenient way of imposing well-posedness is to restrict the support of the functions considered to a bounded $W$. If the features of interest are associated with low-frequency components only, then, if the functions are restricted to a bounded set, the low-frequency part can be identified and the problem is well-posed.

5 Implications for estimation 

5.1 Plug-in non-parametric estimation 

The solutions in Table 5, which express the unknown functions via known functions of observables, give scope for plug-in estimation. As the example of models 4 and 4a shows, different expressions for the same functions will provide different plug-in estimators.
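A minimal plug-in sketch for model 1, assuming a hypothetical design with $x^* \sim N(0,1)$ and a known ordinary smooth error $u \sim$ Laplace$(0, 0.5)$, for which $\phi_u(s) = 1/(1+0.25s^2)$: the empirical characteristic function of $z$ is divided by $\phi_u$ to estimate $\phi_{x^*}$ on a bounded grid, where the result can be compared with $e^{-s^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, b = 20_000, 0.5  # sample size and Laplace scale (hypothetical)
z = rng.standard_normal(n) + rng.laplace(0.0, b, n)  # z = x* + u

s = np.linspace(-3.0, 3.0, 31)
phi_z_hat = np.array([np.exp(1j * t * z).mean() for t in s])  # empirical cf of z
phi_u = 1.0 / (1.0 + (b * s) ** 2)                            # known error cf
phi_xstar_hat = phi_z_hat / phi_u                             # plug-in estimate

err = np.max(np.abs(phi_xstar_hat - np.exp(-s**2 / 2.0)))
print(f"max error on [-3, 3]: {err:.3f}")
```

Because $1/\phi_u$ grows only polynomially here, the sampling error of the empirical characteristic function is amplified by at most a polynomial factor on the grid.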

The functions of the observables here are characteristic functions and Fourier transforms of density-weighted conditional expectations (in some cases, their derivatives); these can be estimated by nonparametric methods, and for some of them, e.g. characteristic functions, direct estimators exist. In the space S* the Fourier transform and the inverse Fourier transform are continuous operations; thus applying the Fourier transform to standard estimators of density-weighted expectations provides consistency in S* (the details are provided in Zinde-Walsh, 2012). The solutions can then be expressed via those estimators by the operations from Table 5, and, as long as the problem is well-posed, the resulting estimators will be consistent and will converge at the appropriate rate. As in An and Hu (2012), for well-posed problems in S* the convergence rate may be even faster than the usual nonparametric rate in (ordinary) function spaces. For example, as demonstrated in Zinde-Walsh (2008), kernel density estimators, which may diverge if the distribution function is not absolutely continuous, are always (under the usual assumptions on the kernel and bandwidth) consistent in the weak topology of the space of generalized functions, where the problem of density estimation is well-posed. Here, deconvolution is well-posed as long as the error density is not super-smooth.
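The contrast between pointwise divergence and weak consistency of kernel density estimators can be seen in a small simulation. The following are illustrative assumptions, not taken from the paper: a Gaussian kernel, the mixture 0.5·δ_0 + 0.5·N(0,1) (which has no density), and the test function ψ(x) = exp(−x²/2). The hypothetical helper `kde_at` evaluates the estimator at the atom, where it grows like 1/h as the bandwidth h shrinks; `kde_pairing` computes the value of the estimator on ψ exactly, because a Gaussian kernel convolved with ψ has a closed form. That value settles near 0.5ψ(0) + 0.5E[ψ(N(0,1))] = 0.5 + 0.5/√2.

```python
import numpy as np

# Illustrative only: a distribution with an atom has no density, yet the kernel
# estimator converges as a functional on test functions (weak topology).
rng = np.random.default_rng(1)
n = 200_000
atom = rng.random(n) < 0.5                       # assumed mixture: 0.5*delta_0 + 0.5*N(0,1)
x = np.where(atom, 0.0, rng.standard_normal(n))


def kde_at(point, data, h):
    """Gaussian-kernel density estimate evaluated at a single point."""
    u = (point - data) / h
    return np.exp(-u ** 2 / 2).mean() / (h * np.sqrt(2 * np.pi))


def kde_pairing(data, h):
    """Exact integral of f_hat(x) * psi(x) dx for psi(x) = exp(-x^2 / 2).

    Convolving the Gaussian kernel with psi gives a closed form per observation."""
    s2 = 1.0 + h ** 2
    return (np.exp(-data ** 2 / (2 * s2)) / np.sqrt(s2)).mean()


truth = 0.5 * 1.0 + 0.5 / np.sqrt(2.0)           # 0.5*psi(0) + 0.5*E[psi(N(0,1))]
for h in (0.5, 0.1, 0.02):
    print(h, kde_at(0.0, x, h), kde_pairing(x, h), truth)
```

The pointwise column blows up as h decreases, while the functional column stays close to `truth`, which is the sense in which the density problem is well-posed in the space of generalized functions.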

5.2 Regularization in plug-in estimation 

When well-posedness cannot be ensured, plug-in estimation does not yield consistent estimators and some regularization is required; for the problems considered here this is usually spectral cut-off. In the context of these nonparametric models, regularization requires extra information: knowledge of the rate of decay of the Fourier transforms of some of the functions.

For Model 1 this is not a problem, since $\phi_u$ is assumed known; regularization uses the information about the decay of this characteristic function to construct a sequence of compactly supported solutions whose support increases at a corresponding rate. In S* no regularization is required for plug-in estimation unless the error distribution is super-smooth. In function classes, exponential growth of $\phi_u^{-1}$ yields a logarithmic rate of convergence for the estimator (Fan, 1991). Below we examine spectral cut-off regularization for deconvolution in S* when the error density is super-smooth.

With super-smooth error in S*, define a class of generalized functions $\Phi(A, m, V)$ for some non-negative-valued function $A$: a generalized function $b \in \Phi(A, m, V)$ if there exists a function $\bar b(\zeta) \in \Phi(m, V)$ such that also $\bar b(\zeta)^{-1} \in \Phi(m, V)$ and $b = \bar b(\zeta) \exp(-A(\zeta))$. Note that a linear combination of functions in $\Phi(A, m, V)$ belongs to the same class. Define convergence: a sequence $b_n \in \Phi(A, m, V)$ converges to zero if the corresponding sequence $\bar b_n$ converges to zero in S*.

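Convergence in probability for a sequence of random functions $\varepsilon_n$ in S* is defined as follows: $(\varepsilon_n - \varepsilon) \to_p 0$ in S* if for any set $\psi_1, \ldots, \psi_l \in S$ the random vector of the values of the functionals converges: $\left( (\varepsilon_n - \varepsilon, \psi_1), \ldots, (\varepsilon_n - \varepsilon, \psi_l) \right) \to_p 0$.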

Lemma 2. If in Model 1 $\phi_u = b \in \Phi(A, m, V)$, where $A$ is a polynomial function of order no more than $k$, and $\varepsilon_n$ is a sequence of estimators of $\varepsilon$ that are consistent in S* at some rate $r_n \to \infty$, i.e. $r_n(\varepsilon_n - \varepsilon) \to_p 0$ in S*, then for any sequence of constants $0 < \bar B_n < (\ln r_n)^{1/k}$ and the corresponding set $B_n = \{\zeta : \|\zeta\| < \bar B_n\}$, the sequence of regularized estimators $\phi_u^{-1}(\varepsilon_n - \varepsilon) I(B_n)$ converges to zero in probability in S*.

Proof. For any $\psi \in S$, the value of the random functional is

$$(\phi_u^{-1}(\varepsilon_n - \varepsilon) I(B_n), \psi) = \int \bar b^{-1}(\zeta)\, r_n(\varepsilon_n - \varepsilon)\, r_n^{-1} I(B_n) \exp(A(\zeta))\, \psi(\zeta)\, d\zeta.$$

Multiplication by $\bar b^{-1} \in \Phi(m, V)$, where $\bar b$ is the factor in the representation of $\phi_u = b$, does not affect convergence; thus $\bar b^{-1}(\zeta) r_n(\varepsilon_n - \varepsilon)$ converges to zero in probability in S*. To show that $(\phi_u^{-1}(\varepsilon_n - \varepsilon) I(B_n), \psi)$ converges to zero, it is sufficient to show that the function $r_n^{-1} I(B_n) \exp(A(\zeta)) \psi(\zeta)$ is bounded. It is then sufficient to find $B_n$ such that $r_n^{-1} I(B_n) \exp(A(\zeta))$ is bounded (possibly by a polynomial); thus it is sufficient that $\sup_{\zeta \in B_n} |\exp(A(\zeta)) r_n^{-1}|$ be bounded. This will hold if $\exp(\bar B_n^k) < r_n$, that is, $\bar B_n^k < \ln r_n$. ∎
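The proof reduces the choice of the cut-off radius to the inequality exp(B̄_n^k) < r_n. A small sketch of that rule shows how slowly the admissible radius grows with the sample size. It assumes, hypothetically, that A(ζ) = c‖ζ‖^k exactly (the proof implicitly takes c = 1), that the plug-in estimator converges at r_n = √n, and it uses an illustrative safety factor `theta` < 1 so the radius stays strictly inside the bound; the helper name `cutoff` is ours. Two super-smooth cases are shown: a standard normal error (A(ζ) = ζ²/2, k = 2) and a Cauchy error (A(ζ) = |ζ|, k = 1).

```python
import numpy as np


def cutoff(r_n, k, c=1.0, theta=0.9):
    """Spectral cut-off radius B with exp(c * B**k) = r_n**theta < r_n.

    Assumes (hypothetically) A(zeta) = c * |zeta|**k; theta < 1 keeps B strictly
    below the bound (ln r_n / c)**(1/k), which is the Lemma 2 bound when c = 1."""
    return (theta * np.log(r_n) / c) ** (1.0 / k)


for n in (10**3, 10**5, 10**7, 10**9):
    r_n = np.sqrt(n)                        # assumed root-n rate for the plug-in estimator
    b_gauss = cutoff(r_n, k=2, c=0.5)       # N(0,1) error: A(zeta) = zeta**2 / 2
    b_cauchy = cutoff(r_n, k=1, c=1.0)      # Cauchy error: A(zeta) = |zeta|
    amp = np.exp(0.5 * b_gauss ** 2) / r_n  # sup of exp(A) / r_n over the cut-off ball
    print(n, round(b_gauss, 3), round(b_cauchy, 3), round(amp, 4))
```

Multiplying n by a factor of a million only moves the Gaussian-error radius by a few units, which is the logarithmic growth of the support referred to in the text.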

Of course, an even slower growth of the spectral cut-off would result from an $A$ that grows faster than a polynomial. The consequence of the slow growth of the support is usually a correspondingly slow rate of convergence for $\phi_u^{-1} \varepsilon_n I(B_n)$. Additional conditions (as in function spaces) are needed for the regularized estimators to converge to the true $\gamma$.
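The regularized estimator φ_u^{-1} ε_n I(B_n) can be sketched numerically for a super-smooth error. Assumed for illustration only: z = x + u with x = ±1 (no density) and u ~ N(0, σ²), so that A(ζ) = cζ² with c = σ²/2 and k = 2; ε_n is taken to be the empirical characteristic function of z with an assumed rate r_n = √n; and the cut-off radius uses a hypothetical safety factor `theta`, so that exp(A(B̄_n)) = r_n^θ. The estimate is set to zero outside the ball B_n.

```python
import numpy as np

# Hypothetical setup: z = x + u with x = +/-1 (no density) and u ~ N(0, sigma^2) (super-smooth).
rng = np.random.default_rng(2)
n, sigma, theta = 100_000, 0.5, 0.5
x = rng.choice([-1.0, 1.0], size=n)
z = x + sigma * rng.standard_normal(n)

r_n = np.sqrt(n)                                   # assumed rate of the ECF estimator
c = sigma ** 2 / 2                                 # A(zeta) = c * zeta**2, so k = 2
b_bar = np.sqrt(theta * np.log(r_n) / c)           # cut-off radius: exp(A(b_bar)) = r_n**theta

zeta = np.linspace(-6.0, 6.0, 241)                 # frequency grid, step 0.05
eps_n = np.array([np.exp(1j * t * z).mean() for t in zeta])   # ECF of z
phi_u = np.exp(-c * zeta ** 2)                     # known error characteristic function
inside = np.abs(zeta) < b_bar                      # indicator I(B_n)
phi_x_reg = np.where(inside, eps_n / phi_u, 0.0)   # regularized estimator of phi_x

for t in (0.5, 1.0, 2.0):
    i = int(np.argmin(np.abs(zeta - t)))
    print(t, phi_x_reg[i].real, np.cos(zeta[i]))   # estimate vs true phi_x = cos
```

At low frequencies the estimate tracks cos(ζ) closely; nearer the cut-off the factor exp(A(ζ)) amplifies sampling noise, which is why the support can only grow at the slow rate allowed by Lemma 2.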

It may be advantageous to focus on the lower-frequency components and ignore the contribution of high frequencies when the features of interest depend on the low-frequency contribution.

6 Concluding remarks 

Working in spaces of generalized functions extends the results on nonparametric identification and well-posedness for a wide class of models. Here identification in deconvolution is extended from the usually considered classes of integrable density functions to generalized densities in the class of all distributions. In regression with Berkson error, nonparametric identification in S* holds for functions of polynomial growth, extending the usual results obtained in Li; a similar extension applies to regression with measurement error and Berkson-type measurement. This makes it possible to consider binary choice and polynomial regression models. Identification is also covered for models with a sum-of-peaks regression function, which cannot be represented in function spaces. Well-posedness results in S* likewise extend the results in the literature obtained in function spaces: well-posedness of deconvolution holds as long as the characteristic function of the error distribution does not go to zero at infinity too fast (as it does, e.g., for a super-smooth error), and a similar condition provides well-posedness in the other models considered here.

Further investigation of the properties of estimators in spaces of generalized functions requires deriving the generalized limit process for the function being estimated and investigating when it can be described as a generalized Gaussian process. A generalized Gaussian limit process holds for the kernel estimator of the generalized density function (Zinde-Walsh, 2008). Determining the properties of inference based on the limit process for generalized random functions requires both further theoretical development and simulation evidence.